Instructions

The Set-up

Learning objectives.

  1. Import and use satellite images as raster files R.
  2. Learn the bases of raster-based analysis and manipulations in R.
  3. Implement and interpret Hierarchical clustering analysis in R.
  4. Implement and interpret Non-Hierarchical clustering analysis in R.

Before you start

The practical will be run using RStudio cloud to avoid individual machine troubleshooting. This means you need to have an account for RStudio cloud.

However, remember to have R and R-studio in your/room computer so you can work on the project after class.

You should have also 1. Read chapter 3 and 4 of Borcard et al (2018) Numerical Ecology book.

The way it’s going to be run

In this 3h practical you will to implement what you have learned in the class about Classification in R.

The practical will be run using RStudio cloud to avoid individual machine troubleshooting. This means you need to have an account for RStudio cloud.

However, remember to have R (https://cran.r-project.org) and R-studio (http://www.rstudio.com/) in your/room computer so you can work on the project after class.

One important thing, this time around you have to create your own R-markdown file. The reason is so you start getting you ready for puting together the final project.

The Setup

During the next few weeks you will work with satellite information data coming from the Landsat-8 program (https://www.usgs.gov/landsat-missions/landsat-8). One of my goals is to give you hands-on experience with the type of anlyses you can do with the information coming from remote sensors. However, the tutorial will NOT teach you how to get this data - I did so on week-9 Tutorial/Practical.

Assessment.

Submit your knitted R-markdown file (the HTML) via BrightSpace - Before the next practical! The assessment is a Pass/Fail based on how you write and annotate the code - You need to show us you know what you are doing and NOT copying someone’s code.

Before you start.

Identifying the suitable satellite images for a specific project remains a challenging task, as ` there are numerous sources of remotely sensed data from satellites. Exceptionally high spatial resolution data is available as (costly) commercial products (for an example, see https://www.planet.com). Lower spatial resolution data is freely available from NASA, ESA, and other organisations. In this practical, rather than focusing on identifying the most adequate satellite imagery, you will use freely available from the Landsat-8 program. For this practical, you will use a cloud-free composite image from Landsat-8 with eleven bands. All Landsat image scenes have a unique product ID and metadata. You can find the information on Landsat sensor, satellite, location on Earth (WRS path, WRS row) and acquisition date from the product ID.

Unsupervised Classification of satellite images

Data exploration and visualisation

Your first goal here will be to access and explore the provided satellite images with R.

The first step, like always, is to load the satellite image data into R. For this, you will use the raster or terra packages. To ensure you have it on your computer, install it using either the GUI or type the instruction install.packages("raster", dependencies ==T) or install.packages("terra", dependencies ==T) in the console.

Once you know you have the raster/terra packages on your computer, you need to load it using the library() function.

# Load the required package (raster)
library("raster")

With this, you are ready to start the analysis of Landsat-8 satellite information.

The first step is to load the Landsat image contained in the file Aarhus_2019_utm_L8.tif into R. Before you do so, it is essential to know that Aarhus_2019_utm_L8.tif is a geoTIFF image. These types of files are different from a regular image as it has associated geospatial information (e.g., projection, coordinate systems, ellipsoids, datums - remember your GIS course).

For loading satellite images into R, you need to use the function raster, which is part of the raster R-package. Remember that if you need help using the function raster, you can type and execute ?raster.

Your task:

Load the Aarhus_2019_utm_L8.tif, located in located in R-Studio cloud data [see the Files tab in the lower right window] as an object named landsat.1BND.

Then print the landsat.1BND object.

# Use `raster()` to load the file and save it as an object named landsat.1BND.
landsat.1BND <- raster("./Data/Aarhus_2019_utm_L8.tif")

# Print the output of the landsat.1BND object.
landsat.1BND
## class      : RasterLayer 
## band       : 1  (of  7  bands)
## dimensions : 332, 493, 163676  (nrow, ncol, ncell)
## resolution : 30, 30  (x, y)
## extent     : 564270, 579060, 6219660, 6229620  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs 
## source     : Aarhus_2019_utm_L8.tif 
## names      : layer 
## values     : 0, 0.347745  (min, max)

Note that the code above only loads a single Landsat layer (band). To load all the layers simultaneously, you should use the stack() function. This creates a RasterStack that you could consider as an analogue to an array.

Your task:

Load ALL the bands from the Aarhus_2019_utm_L8.tif, located in located in R-Studio cloud data [see the Files tab in the lower right window] as an object named landsat8.

Then print the landsat8 object.

# Use `stack()` to load the file and save it as an object named landsat8.
landsat8 <- stack("./Data/Aarhus_2019_utm_L8.tif")

# Print the output of the landsat8 object.
landsat8
## class      : RasterStack 
## dimensions : 332, 493, 163676, 7  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 564270, 579060, 6219660, 6229620  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs 
## names      :   layer.1,   layer.2,   layer.3,   layer.4,   layer.5,   layer.6,   layer.7 
## min values :         0,         0,         0,         0,         0,         0,         0 
## max values : 0.3477450, 0.3674900, 0.4233425, 0.4276600, 0.5316375, 0.5013050, 0.4579925

Printing the landsat8 object shows you that this is a RasterLayer with seven (7) layers (nlayers) that correspond to different bands of the light spectrum:

  • Band 1 Coastal Aerosol (0.43 - 0.45 µm).

  • Band 2 Blue (0.450 - 0.51 µm).

  • Band 3 Green (0.53 - 0.59 µm).

  • Band 4 Red (0.64 - 0.67 µm).

  • Band 5 Near-Infrared (0.85 - 0.88 µm).

  • Band 6 SWIR 1(1.57 - 1.65 µm).

  • Band 7 SWIR 2 (2.11 - 2.29 µm).

Your task:

Using the names() function, Now change the names of the bands in landsat8 as: Band 1 = “Coastal”; Band 2 = “Blue”; Band 3 = “Green”; Band 4 = “Red”; Band 5 = “NIR”; Band 6 = “SWIR11”; Band 6 = “SWIR12”.

This process is the same as you will do with a vector/data.frame where you define the names as names(Vact.ob) <- c('a', 'b', 'c'):

# Print the current names of landsat8.
names(landsat8)
## [1] "layer.1" "layer.2" "layer.3" "layer.4" "layer.5" "layer.6" "layer.7"
# Change the names of landsat8 using the names() function.
names(landsat8) <- c("Coastal", "blue", "green", "red", "NIR", "SWIR1", "SWIR2")

# Print the new names of landsat8.
names(landsat8)
## [1] "Coastal" "blue"    "green"   "red"     "NIR"     "SWIR1"   "SWIR2"

Ploting raster objects

With the satellite information saved as an object (landsat8), you can now plot it to see what is represented here. For this, you will use the function plot and specify the band of interest as you would in an object of the class list (that is, using double square-brackets rastFile[[nlayer]]). For example, to plot the ‘Blue’ band in landsat8 you will execute plot(landsat8[["blue"]]) or plot(landsat8[[1]].

Your task:

Plot each of the landsat8 layers individually. That is, use the plot() function once each time for the bands.

# Set a plotting space of 3-rows and 2-columns
par(mfcol = c(3, 3))

# Plot the first band - using the "text string" to define the band to plot
plot(landsat8[["Coastal"]])

# Plot the second band - using the "text string" to define the band to plot
plot(landsat8[["blue"]])

# Plot the third band - using the "text string" to define the band to plot
plot(landsat8[["green"]])

# Plot the fourth band - using the "layer number" to define the band to plot
plot(landsat8[["red"]])

# Plot the fifth band - using the "layer number" to define the band to plot
plot(landsat8[[5]])

# Plot the sixth band - using the "layer number" to define the band to plot
plot(landsat8[[6]])

# Plot the sixth band - using the "layer number" to define the band to plot
plot(landsat8[[7]])

The approach above is cumbersome and clumsy but give you a sense of how to plot individual layers. You could have done the same as above by running plot(landsat8). This can also be implemented on a selection of the RasterStack layers by selecting which ones to plot inside the double square brackets (rastFile[[nlayer1:nlayer2]]).

Your task:

Plot four (4) layers of the RasterStack (the RGB+NRI spectrum) in a single visualization panel using the plot() function.

Also, explore changing the color scheme of your plot by using the argument col. Here you will use two color schemes:

  • A grey color scheme.

  • A Red-Yellow-Blue (RdYlBu) color scheme using a hue-chroma-luminance (HCL) colours.

par(oma = c(1, 1, 3, 1))
# Plot the RGB spectrum using a grey colour scheme.
plot(
  x = landsat8[[2:5]], # Which raster object, or layers of it should be plotted?
  main = names(landsat8[[1:4]]), # Define the title for each panel.
  col = gray(0:100 / 100) # Define the colour scheme.
)

# Give the plot a title.
mtext(
  text = "RGB+NRI spectrum - Grey scale", # The main title text
  side = 3, # On which side of the plot should the title be placed? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Place the title in the outer margin region?
  cex = 2, # Define the text size.
  line = 1 # Define the line location in the outer margin region.
)

# Plot the RGB spectrum using a Red-Yellow-Blue colour scheme.
plot(
  x = landsat8[[1:4]], # Which raster object or layers of it should be plotted?
  main = names(landsat8[[1:4]]), # Define the title for each panel.
  col = hcl.colors(100, palette = "RdYlBu") # Define the colour scheme.
)
# Give the plot a title.
mtext(
  text = "RGB+NRI spectrum - HCL scale",
  side = 3, # On which side of the plot should the title be placed? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Place the title in the outer margin region?
  cex = 2, # Define the text size.
  line = 1 # Define the line location in the outer margin region.
)

The difference in shading and range of legends between the different bands is because various surface features reflect the incident solar radiation differently.

Each layer represents how much incident solar radiation is reflected for a particular wavelength range. For example, vegetation reflects more energy in NIR than other wavelengths and thus appears brighter. In contrast, water absorbs most of the energy in the NIR wavelength, and it seems dark.

Your task:

Now plot the four (4) layers of the RasterStack representing the RGB+NRI spectrum again. However, this time make sure that the range of each panel goes between 0 and 1.

To ensure the color ramps in all maps range between 0 and 1, use the argument zlim when plotting each raster. You can use either a grey or Red-Yellow-Blue colour scheme.

par(oma = c(1, 1, 3, 1))
# Plot the RGB spectrum using a grey colour scheme.
plot(
  x = landsat8[[1:4]], # Which raster object, or layers of it should be plotted?
  main = names(landsat8[[1:4]]), # Define the title for each panel.
  col = gray(0:100 / 100), # Define the colour scheme.
  zlim = c(0, 1) # set the upper/lower limits to the plotted values on each panel
)

# Give the plot a title.
mtext(
  text = "RGB+NRI spectrum - Grey scale", # The main title text
  side = 3, # On which side of the plot should the title be placed? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Place the title in the outer margin region?
  cex = 2, # Define the text size.
  line = 01 # Define the line location in the outer margin region.
)

# Plot the RGB spectrum using a Red-Yellow-Blue colour scheme.
plot(
  x = landsat8[[1:4]], # Which raster object or layers of it should be plotted?
  main = names(landsat8[[1:4]]), # Define the title for each panel.
  col = hcl.colors(100, palette = "RdYlBu"), # Define the colour scheme.
  zlim = c(0, 1) # set the upper/lower limits to the plotted values on each panel
)
# Give the plot a title.
mtext(
  text = "RGB+NRI spectrum - HCL scale",
  side = 3, # On which side of the plot should the title be placed? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Place the title in the outer margin region?
  cex = 2, # Define the text size.
  line = 1 # Define the line location in the outer margin region.
)

Combining bands: composite plots of raster objects

Satellite images are often combined into a “composite” to create plots that reveal more information. A simple composition is to merge them into a true (or natural) colour image. These images look similar to a standard photograph (vegetation in green, water blue, etc.). For this, you need bands in the red, green and blue regions. For this Landsat image, band-4 (red), band-3 (green), and band-2 (blue) can be used. You can use the plotRGB() function to combine them into a single composite.

Your task:

Now plot a true colour image of the region of interest using the plotRGB() function.

Be sure to plot these in the right order: red (band 4), green (band 3), and blue (band 2).

# Generate a temporary RasterStack with the layers in the right order Red-Green-Blue. Save it as an object named landsatRGB.
landsatRGB <- landsat8[[c("red", "green", "blue")]]
# OR
landsatRGB <- landsat8[[c(4, 3, 2)]]

# Use the plotRGB() function to make the true colour composites - these follow a RGB order
plotRGB(landsatRGB, # True colour composites flow a RGB order.
  r = 1, g = 2, b = 3, # Define which band is red, green, and blue.
  scale = 1, # defined the maximum value a band can take (used to scaling when colouring).
  main = "Landsat True Color Composite" # add a title to the figure
)

That last figure looks a bit dark. Luckily the plotRGB() function has a few tricks that can help you enhance your visualisation. By defining the stretch argument to either "lin" or "hist" you can stretch the values to increase the image’s contrast. These approaches redistribute the pixel values along the histogram for each displayed band in a linear fashion (lin) or by matching the brakes of a histogram (hist).

Your task:

Now plot a true colour image of the region of interest using the plotRGB() function specifying both a linear and histogram stretch

# Generate a temporary RasterStack with the layers in the right order Red-Green-Blue. Save it as an object named landsatRGB.
landsatRGB <- landsat8[[c("red", "green", "blue")]]
# OR
landsatRGB <- landsat8[[c(4, 3, 2)]]

# define the plotting space
par(
  mfrow = c(1, 2),
  mar = rep(1, 4)
)

# Use the plotRGB() function with a linear stretch to make the true colour composites - these follow an RGB order
plotRGB(landsatRGB, # True colour composites flow a RGB order.
  r = 1, g = 2, b = 3, # Define which band is red, green, and blue.
  scale = 1, # defined the maximum value a band can take (used to scaling when colouring).
  main = "Landsat True Color Composite\nlin stretch", # add a title to the figure.
  stretch = "lin", # How to stretch the values to increase the contrast?
  margins = TRUE
)

# Use the plotRGB() function with a  histogram stretch to make the true colour composites - these follow an RGB order
plotRGB(landsatRGB, # True colour composites flow a RGB order.
  r = 1, g = 2, b = 3, # Define which band is red, green, and blue.
  scale = 1, # defined the maximum value a band can take (used to scaling when colouring).
  main = "Landsat True Color Composite\nhist stretch", # add a title to the figure.
  stretch = "hist", # How to stretch the values to increase the contrast?
  margins = TRUE
)

These look better, and now it is easy to visualise different areas.

Another popular image visualisation method in remote sensing is known as “false colour” images, in which NIR, red, and green bands are combined in this order. This representation makes it easy to see the vegetation (in red), which is very popular in ecological analyses.

Your task:

Now plot a false colour image of the region of interest using the plotRGB() function.

Be sure to plot these in the right order: NIR (band 4), red (band 3), and green (band 2).

Here set the stretch argument to "linear" so the contrast improves.

Based on the false colour image, answer the question: Where do you see vegetation in this plot?

# Generate a temporary RasterStack with the layers in the right order Red-Green-Blue. Save it as an object named landsat.NriRG.
landsat.NriRG <- landsat8[[c("NIR", "red", "green")]]

par(mfrow = c(1, 1))

# Use the plotRGB() function with a linear stretch to make the false colour composites - these follow an RGB order
plotRGB(landsat.NriRG, # True colour composites flow an RGB order.
  r = 1, g = 2, b = 3, # Define which band is red, green, and blue.
  scale = 1, # defined the maximum value a band can take (used to scaling when colouring).
  main = "Landsat False Color Composite\nlin stretch", # add a title to the figure.
  stretch = "lin", # How to stretch the values to increase the contrast?
  margins = TRUE
)

Question: Where do you think youy can see vegetation in this plot? In the Upper part of the image, where the colour is dark and light red. Based on the true colour images, these areas can be agricultural areas.

Assessing the relation between bands

Looking at a scatter-plot matrix between bands can help explore relationships between raster layers. For example, you can assess these relations by plotting the reflection in one band against reflection in another (e.g., red vs NRI).

Your task:

Using the pairs() function of the raster package, you will plot the reflection of the red band on NRI.

# Plot the relation between the red on NRI layers in a RasterStack using the pairs() function.
pairs(
  x = landsat8[[4:5]], # RasterStack to plot.
  main = "Red versus NIR"
) # the title of the figure.

The relation between NIR and red is unique due to its triangular shape. Vegetation reflects highly in the NIR range than red and creates the upper corner close to the NIR (y) axis. Water absorbs energy from all the bands and occupies the location close to the origin. The furthest corner is created due to highly reflecting surface features like bright soil or concrete.

The raster package supports many mathematical operations. These operations are generally performed per pixel (grid cell). Map algebra with Raster objects flows the same rules of algebraic operation that R-objects do, but focuses on bands in a RasterStack (landsat8[[1]] + landsat8[[2]]) or individual RasterLayers (landsatFCC + landsatRGB).

Your task:

Subset the original landsat8 RasterStack into two new RasterStacks.

The FIRST, named landsatRGB, will consist of the red, green, and blue bands (in that order).

The SECOND is named landsatFCC and will consist of the NIR, red, green bands (in that order).

# Subset the original `landsat8` `RasterStack` into a raster stack consisting of the red, green, and blue bands. Name this object landsatRGB.
landsatRGB <- landsat8[[c(4, 3, 2)]]

# Subset the original `landsat8` `RasterStack` into a raster stack consisting of the NIR, red, green bands. Name this object landsatFCC
landsatFCC <- landsat8[[c(5, 4, 3)]]

Using the two RasterStacks generated above, you will now use “raster algebra” to generate a series of Vegetation indices.

In general terms, vegetation indices are dimensionless radiometric measures, which combine information from different channels of the electromagnetic spectrum to enhance the vegetation signal. They measure the spatial and temporal variation of the plant’s photosynthetic activity. Due to their simplicity, vegetation indices are widely used. The most commonly used vegetation indices include Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Difference Vegetation Index (DVI). Here you will focus on the Normalised Difference Vegetation Index (NDVI), which makes use of the red and near-infra-red bands since the energy reflected in these wavelengths is related to the amount of vegetation cover on the ground surface.

Specifically, NDVI is the difference in the amount of visible and near-infra-red light reflected from plants on Earth’s surface. Mathematically, this translates into \(NDVI=\frac{(NIR-Red)}{(NIR+Red)}\); meaning that by design, the NDVI itself varies between -1.0 and +1.0.

This information makes it possible to estimate NDVI using the landsat8 or landsatFCC RasterStacks.

Your task:

You will use the information in landsatFCC to estimate the evaluated region’s Normalised Difference Vegetation Index (NDVI).

# Extract the near-infra-red band.
NIRBnd <- landsatFCC[[1]]

# Extract the red band.
RedBnd <- landsatFCC[[2]]

# Estimate NDVI as NDVI=(NIR-Red)/(NIR+Red).
NDVI <- (NIRBnd - RedBnd) / (NIRBnd + RedBnd)


# Plot the NDVI.
plot(NDVI, # The NDVI Raster Stack.
  col = rev(terrain.colors(10)), # Define the colours .
  main = "Landsat-NDVI"
) # Give a Name.

# You can do this in one step also:
# NDVI in just one go using landsat8.
NDVI.1 <- (landsat8[[4]] - landsat8[[3]]) / (landsat8[[4]] + landsat8[[3]])

# NDVI in just one go using landsatFCC.
NDVI.2 <- (landsatFCC[[1]] - landsatFCC[[2]]) / (landsatFCC[[1]] + landsatFCC[[2]])

Just as with other R-objects, you can explore the distribution of values within our raster using the hist() function, which produces a histogram. Histograms are often useful in identifying outliers and lousy data values in our raster data.

Your task:

Plot the histogram of the NDVI RasterLayer object created above using the hist() function.

# Plot the histogram of NDVI using the hist() function.
hist(
  x = NDVI, # raster object.
  main = "Distribution of NDVI values", # Main Title.
  xlab = "NDVI", # x-axis label.
  ylab = "Frequency", # x-axis label.
  col = "wheat", # Bars colour.
  xlim = c(-1, 1), # Limit of the x-axis.
  breaks = 30, # into how many classes shod the data be divided.
)

You can apply basic rules to estimate the spatial extent of different Earth surface features based on the idea that NDVI values are standardized and ranges between -1 to +1. So high values are vegetation, intermediate areas might be open areas, and low values can be water.

Looking at the histogram, you could say that the highest values (values above 0.4) are vegetation. You can visualize this by mathematically masking the NDVI using the logical test.

Your task:

Using a logical create a mask of the NDVI RasterLayer so that it marks as true (or 1) the highest values (values above 0.4).

Use this mask to remove all areas that are not vegetated.

# Create a mask of the NDVI RasterLayer for values above 0.4. Name this raster raster NDVImask.
NDVImask <- NDVI >= .4

# plot the NDVImask.
plot(NDVImask)

# Using NDVImask and map algebra operations mask the non-vegetations values from the NDVI RasterLayer. Name this new object NDVIVeg.
NDVIVeg <- NDVI / NDVImask

# plot the NDVImask.
plot(NDVIVeg, # RasterLayer to plot.
  col = rev(terrain.colors(10)), # Set the colours.
  main = "Vegetation"
) # add a Main title.

# An advanced solution using the mask() function.
# Make the 0 values NA.
NDVImask[NDVImask[] == 0] <- NA
# Mask the non-vegetations values using the mask() function.
NDVIVeg2 <- mask(NDVI, NDVImask)
# plot the NDVImask.
plot(NDVIVeg2, # RasterLayer to plot.
  col = rev(terrain.colors(10)), # Set the colours.
  main = "Vegetation_v2"
) # add a Main title.

Question Which color do “vegetated areas (those the highest values) have?. Green.

Hierarchical clustering

In hierarchical clustering the idea is to generate a system of grouping according to a hierarchy, or levels and orders.In the field of machine learning, hierarchical classification is sometimes referred to as instance space decomposition,which splits a complete multi-class problem into a set of smaller classification problems.

Here you will first generate a series of hierarchical classifications, and then determine how good they are into generate five (5) unique clusters.

The first step to do your classification is to extract the NDVI raster and save these into a vector. This is because the classification function used to define classes work on a vector of values, NOT on the cells in a raster.

Your task:

Using the function getValues(), extract the values in the NDVI raster and save these into a vector.

To know how to use the function type ?getValues.

# Using the function `getValues()` extract the values in the `NDVI` into an object named NDVIVctv.
NDVIVct <- getValues(NDVI)

# Just to be on the safe side turn NA into -1
NDVIVct[is.na(NDVIVct)] <- -1

# Check the stricture of the vector with the raster data.
str(NDVIVct)
##  num [1:163676] 0.751 0.742 0.628 0.677 0.674 ...

Using the vector created with the test above, you can now perform a Hierarchical classification using four (4) different approaches:

  1. Single linkage.
  2. Complete linkage.
  3. Average linkage.
  4. Ward’s hierarchical clustering.

Your task:

Using the function hclust(), you will now generate a series of hierarchical classifications based on on the Euclidean distance of NDVI values. For this, you will:

  1. Create a object named NDVIVctRndSmp where you extract 100 values from NDVIVct.

  2. Create a Object names NDVI.Dist, based on the Euclidean distance of NDVI values in NDVIVctRndSmp. use the dist() function to estimate this distance.

  3. Use the function hclust() and the adequate method to create a

  • Single linkage classification.
  • Complete linkage classification..
  • Average linkage classification.
  • Ward’s hierarchical classification.
  1. Plot each of these classifications.
# It is important to set the seed generator for repeatability.
set.seed(25)
# Extract 100 values from NDVIVct
NDVIVctRndSmp <- sample(
  NDVIVct,
  100
)
# Estimate the Distance Matrix based on NDVI values
NDVI.Dist <- dist(NDVIVctRndSmp,
  method = "euclidean"
)

# Single linkage classification.
Hclus.Single <- hclust(NDVI.Dist,
  method = "single"
)
# Plot Single classification.
plot(Hclus.Single,
  main = "normalized-euclidean\nSingle linkage"
)

# Complete linkage classification.
Hclus.Complete <- hclust(NDVI.Dist,
  method = "complete"
)
# Plot Complete classification.
plot(Hclus.Complete,
  main = "normalized-euclidean\nComplete linkage"
)

# Average linkage classification.
Hclus.Average <- hclust(NDVI.Dist,
  method = "average"
)
# Plot Average classification.
plot(Hclus.Average,
  main = "normalized-euclidean\nAverage linkage"
)

# Ward’s hierarchical classification.
Hclus.Ward <- hclust(NDVI.Dist,
  method = "ward.D2"
)
# Plot Ward’s hierarchical classification.
plot(Hclus.Ward,
  main = "normalized-euclidean\nWard.D"
)

The goal of any tree is that it represents the original dissimilarity matrix. To assess this you will look at the relation between original dissimilarities and cophenetic distances (that is the distance in the tree).

Your task:

For each of the four hierarchical clustering implementations estimate the relation between original euclidean distances, and the the cophenetic distances. For this, you will:

  1. Estimate the cophenetic distance for each tree using the cophenetic() function.
  2. Plot the relation between cophenetic distance and the euclidean distances estimated in NDVI.Dist.
  3. Estimate the Pearson’s r correlation between euclidean and each cophenetic distances.
# Single linkage classification cophenetic distance.
Cophe.Single <- cophenetic(Hclus.Single)
## Correlation Single classification cophenetic distance vs euclidean distances.
cor(
  y = Cophe.Single,
  x = NDVI.Dist
)
## [1] 0.6866919
# Plot Single classification cophenetic distance vs euclidean distances.
plot(
  y = Cophe.Single,
  x = NDVI.Dist,
  xlab = "Euclidean distance",
  ylab = "Cophenetic distance",
  pch = 19,
  main = c(
    "Single linkage",
    paste(
      "Cophenetic correlation =",
      round(cor(NDVI.Dist, Cophe.Single), 3)
    )
  )
)
abline(0, 1,
  lwd = 2,
  col = "red"
)

# Complete linkage classification cophenetic distance.
Cophe.Complete <- cophenetic(Hclus.Complete)
## Correlation Complete classification cophenetic distance vs euclidean distances.
cor(
  y = Cophe.Complete,
  x = NDVI.Dist
)
## [1] 0.7657518
# Plot Complete classification cophenetic distance vs euclidean distances.
plot(
  y = Cophe.Complete,
  x = NDVI.Dist,
  xlab = "Euclidean distance",
  ylab = "Cophenetic distance",
  pch = 19,
  main = c(
    "Complete linkage",
    paste(
      "Cophenetic correlation =",
      round(cor(NDVI.Dist, Cophe.Complete), 3)
    )
  )
)
abline(0, 1,
  lwd = 2,
  col = "red"
)

# Average linkage classification cophenetic distance.
Cophe.Average <- cophenetic(Hclus.Average)
## Correlation Average linkage classification cophenetic distance vs euclidean distances.
cor(
  y = Cophe.Average,
  x = NDVI.Dist
)
## [1] 0.7855293
# Plot Average linkage classification cophenetic distance vs euclidean distances.
plot(
  y = Cophe.Average,
  x = NDVI.Dist,
  xlab = "Euclidean distance",
  ylab = "Cophenetic distance",
  pch = 19,
  main = c(
    "Complete linkage",
    paste(
      "Cophenetic correlation =",
      round(cor(NDVI.Dist, Cophe.Average), 3)
    )
  )
)
abline(0, 1,
  lwd = 2,
  col = "red"
)

# Ward’s hierarchical classification cophenetic distance.
Cophe.Ward <- cophenetic(Hclus.Ward)
## Correlation Ward’s classification cophenetic distance vs euclidean distances.
cor(
  y = Cophe.Ward,
  x = NDVI.Dist
)
## [1] 0.7229159
# Plot Ward’s classification classification cophenetic distance vs euclidean distances.
plot(
  y = Cophe.Ward,
  x = NDVI.Dist,
  xlab = "Euclidean distance",
  ylab = "Cophenetic distance",
  pch = 19,
  main = c(
    "Ward’s classification",
    paste(
      "Cophenetic correlation =",
      round(cor(NDVI.Dist, Cophe.Ward), 3)
    )
  )
)
abline(0, 1,
  lwd = 2,
  col = "red"
)

Question: Which method is the best in describing the original distances? The Complete linkage classification was the best method to describe the original distances

Now that you have a best classification approach you will see which observations cluster together.

Your task:

For the best hierarchical clustering model you will pot the hierarchical model, and then plot over it the division into five clusters using the rect.hclust() function.

# Plot the best hierarchical clustering model
plot(Hclus.Average)
# plot over it boxes showing the splint into five clusters.
ClusClass <- rect.hclust(Hclus.Average,
  k = 5
)

# Check which observations where clustered in each class
ClusClass
## [[1]]
## [1] 37 80
## 
## [[2]]
##  [1]   4   5  16  18  23  28  31  33  44  47  48  49  55  58  70  74  79  82  83
## [20]  85  91  98 100
## 
## [[3]]
##  [1]  2  6  8 10 13 14 17 20 24 30 32 36 38 39 40 41 42 45 56 59 62 64 65 66 67
## [26] 71 72 76 77 78 81 87 88 94
## 
## [[4]]
##  [1]  3  7  9 12 15 19 21 25 26 27 34 35 46 60 92 93 95 97
## 
## [[5]]
##  [1]  1 11 22 29 43 50 51 52 53 54 57 61 63 68 69 73 75 84 86 89 90 96 99

The problem with the hierarchical clustering model you have created is that it only encompasses a handful of observations (distance matrices increase exponential in size as you add observations), and these can not be extrapolated into the full domain.

k-means Non-hierarchical clustering

In non-hierarchical clustering (called unsupervised classification in GIS applications), you use only predictors and do not supply any response data. In the context of satellite images, you only use reflectance data, and you do not identify any pixel as belonging to a particular class beforehand. This approach may seem odd, but it can be useful when you do not have much prior knowledge of a study area or a broad knowledge of the distribution of land cover classes of interest but no specific ground data. Here you will use k-means to groups pixels with similar spectral characteristics into groups.

Here you will perform unsupervised classification on a spatial subset of the NDVI RasterLayer. For this, you will use the k-means approach.

The first step to do your classification is to extract the NDVI raster and save these into a vector. This is because the classification function used to define classes work on a vector of values, NOT on the cells in a raster.

Your task:

Using the function getValues(), extract the values in the NDVI raster and save these into a vector.

To know how to use the function type ?getValues.

# Using the function `getValues()` extract the values in the `NDVI` into an object named NDVIVctv.
NDVIVct <- getValues(NDVI)

# Just to be on the safe side turn NA into -1
NDVIVct[is.na(NDVIVct)] <- -1

# Check the stricture of the vector with the raster data.
str(NDVIVct)
##  num [1:163676] 0.751 0.742 0.628 0.677 0.674 ...

Using the vector created with the test above, you can now perform a kmeans clustering and inspect the output. Remember that kmeans need a vector with NO empty cells as an input. That means NO NAs are allowed in the input vector. Also, you will need to specify the number of classes with the centers argument and the algorithm to estimate the groups. Here, you will use the Lloyd algorithm. If in doubt about how to use a function, remember that you look at the examples in the help file you get when typing ?means.

Your task:

Using the kmeans function, classify the vector named NDVIVct into five (5) classes using 500 iterations, ten random sets, and the "Lloyd" method.

# It is important to set the seed generator for repeatability because the `kmeans()` function initiates the centres in random locations.
set.seed(99)

# Impalement the kmeans() function with the criteria defined in the task:
# you want to create five clusters, allow 500 iterations, use ten random sets, and use "Lloyd" method
kmncluster <- kmeans(
  x = na.omit(NDVIVct), # numeric matrix of data to classify.
  centers = 5, # How many clusters?
  iter.max = 500, # Which is the maximum number of iterations allowed?
  nstart = 10, # How many random sets should be chosen?
  algorithm = "Lloyd" # Which algorithm to use when looking at the optimal solution.
)

# kmeans returns an object of class "kmeans"
str(kmncluster)
## List of 9
##  $ cluster     : int [1:163676] 1 1 5 1 1 5 5 1 1 5 ...
##  $ centers     : num [1:5, 1] 0.75 0.17 0.395 -0.273 0.579
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "1" "2" "3" "4" ...
##   .. ..$ : NULL
##  $ totss       : num 10573
##  $ withinss    : num [1:5] 133 176 119 322 117
##  $ tot.withinss: num 867
##  $ betweenss   : num 9706
##  $ size        : int [1:5] 50220 31323 34055 4211 43867
##  $ iter        : int 69
##  $ ifault      : NULL
##  - attr(*, "class")= chr "kmeans"

If you check the length NDVIVct and compare it to the size of the cluster slot in kmncluster, you will see these are are they the same.

Your task:

Compare the length of NDVIVct to the cluster slot size in kmncluster.

Are these the same length?

# length of `NDVIVct`
length(NDVIVct)
## [1] 163676
# length of the clusters vector
length(kmncluster$cluster)
## [1] 163676
# Compare the length of the input and outpout vectors?
length(NDVIVct) == length(kmncluster$cluster)
## [1] TRUE

Question: Are the input and output the same length? YES!! they are the lengths are the same.

The cell values of kmncluster$cluster indicate the cluster label for the corresponding pixel, which ranges between 1 to 5 (corresponding to the input number of cluster you provided in the kmeans function). You can now use this vector to convert the kmncluster$cluster values back to a RasterLayer of the same dimension as the NDVI. For this, you will use the setValues function. Look how this function is set up using ?setValues

Your task:

Use the setValues function to convert the kmncluster$cluster values back into a RasterLayer.

Plot the k-means classified NDVI RasterLayer using a highly contrasting colour scheme.

# Use the ndvi object to set the cluster values to a new raster
knr <- setValues(
  NDVI,
  kmncluster$cluster
)

# You can also do it like this
# Create an empty raster using the NDVI `RasterLayer`
knr <- raster(NDVI)

# Set the values to that empt raster
values(knr) <- kmncluster$cluster

# Plot the k-means classified NDVI using a highly contrasting colour scheme.
plot(knr, # The k-means RasterLayer.
  main = "K-means 5-class classifiction", # Main title.
  col = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00") # A contrasting colour scheme.
)

You can see that knr is a RasterLayer, but you do not know which cluster (1-5) belongs to what land cover class (and if it does belong to a class that you would recognise).

Your task:

Plot the NDVI and knr side-by-side using a unique and contrasting colour for each cluster in knr.

Try to assess which might be vegetation and areas that are not?

par(mfrow = c(1, 2))
# Plot the NDVI values
plot(NDVI, # Object to plot.
  col = rev(terrain.colors(10)), # Which colours to use?
  main = "Landsat-NDVI" # Main title
)

# Plot the k-means classified NDVI using a highly contrasting colour scheme.
plot(knr, # The k-means RasterLayer.
  col = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"), # A contrasting colour scheme.
  main = "Unsupervised classification", # Main title
)

Analysing the classification output - Spectral profiles

To finish, look at the spectral profiles (that is, the average reflectance for a class) of each generated class. For this, you will plot the value for pixels representing a specific earth surface feature. Such profiles constitute the basis for image analysis. You can extract spectral values from any multi-spectral data set using the extract() function. Here you will use the classification just created and extract the values for each class.

To start, you will select 100 points within each class as this makes things run fast and standardise the analyses across classes (not all classes have the same number of cells). You will use the getValues() function to extract ALL the cells ID for each class in knr (that is, the location indexes in the RasterLayer), and the sample function to randomly select 100 points.

Your task:

Create a list with each slot being the random sample of points within a class.

Use the sample function to extract 100-random points from each class and store these in the list created above.

# It is important to set the seed generator for repeatability because the `kmeans()` function initiates the centres in random locations.
set.seed(25)

# Create a `list`  where each slot is a band
Knr.RndPnt <- list(
  Class1 = sample(which(getValues(knr) == 1), 100), # a random sample of 100 point for class 1.
  Class2 = sample(which(getValues(knr) == 2), 100), # a random sample of 100 point for class 2.
  Class3 = sample(which(getValues(knr) == 3), 100), # a random sample of 100 point for class 3.
  Class4 = sample(which(getValues(knr) == 4), 100), # a random sample of 100 point for class 4.
  Class5 = sample(which(getValues(knr) == 5), 100) # a random sample of 100 point for class 5.
)

# Using a lapply you can do this quicker
set.seed(25)
Knr.RndPnt <- lapply(
  1:5, # A vector of the class numbers/names
  function(i) { # A function to get the ID of 100 points by classification class.
    sample(which(getValues(knr) == i), 100)
  }
)

To extract the landsat8 and NDVIReclas values after selecting the 100-random points, you need to define the coordinates of these. For this, you will use the indexes generated above and sample the list of coordinates from the knr RasterLayer. A vital point to know is that applying the function coordinates() to a RasterLayer lists the centre coordinate of all the grids in that object.

Your task:

Get the coordinates for each random point selected above, and store these in a list.

# get the coordinates for each point selected in th step above
Knr.Rndcoord <- list(
  Class1 = coordinates(knr)[Knr.RndPnt[[1]], ], # Coordinates for class 1
  Class2 = coordinates(knr)[Knr.RndPnt[[2]], ], # Coordinates for class 2
  Class3 = coordinates(knr)[Knr.RndPnt[[3]], ], # Coordinates for class 3
  Class4 = coordinates(knr)[Knr.RndPnt[[4]], ], # Coordinates for class 4
  Class5 = coordinates(knr)[Knr.RndPnt[[5]], ] # Coordinates for class 5
)

# The same as above but shorter.
Knr.Rndcoord <- lapply(
  Knr.RndPnt, # The List with the 100 points by classification class.
  function(i) {
    coordinates(knr)[i, ] # Get the coordinates for the selected points.
  }
)

# in just one go both this and the task above
set.seed(25)
Knr.Rndcoord <- lapply(
  1:5,
  function(i) {
    coordinates(knr)[sample(which(getValues(knr) == i), 100), ]
  }
)

You can extract spectral values from any multi-spectral data set using the extract() function. So now extract the values for the six (6) bands in landsat8 and the NDVI values. Using these, generate a plot where the x-axis is the band number, and their value is the average for each band across all 100 points sampled.

Your task:

Get the data for each of the six bands in the landsat8 object and the NDVI values for the points selected in the task above.

Once you have extracted these values for each class, estimate the bands/NDVI means.

# get the bands for each point selected in the step above
SpectProf <- list(
  Class1 = cbind(
    extract(landsat8, Knr.Rndcoord[[1]]), # Get the six bands values.
    extract(NDVI, Knr.Rndcoord[[1]])
  ), # Get the NDVI values.
  Class2 = cbind(
    extract(landsat8, Knr.Rndcoord[[2]]), # Get the six bands values.
    extract(NDVI, Knr.Rndcoord[[2]])
  ), # Get the NDVI values.
  Class3 = cbind(
    extract(landsat8, Knr.Rndcoord[[3]]), # Get the six bands values.
    extract(NDVI, Knr.Rndcoord[[3]])
  ), # Get the NDVI values.
  Class4 = cbind(
    extract(landsat8, Knr.Rndcoord[[4]]), # Get the six bands values.
    extract(NDVI, Knr.Rndcoord[[4]])
  ), # Get the NDVI values.
  Class5 = cbind(
    extract(landsat8, Knr.Rndcoord[[5]]), # Get the six bands values.
    extract(NDVI, Knr.Rndcoord[[5]])
  ) # Get the NDVI values.
)

# The same as above but shorter.
SpectProf <- lapply(
  Knr.Rndcoord, # The list with the coordinates for the 100 points by classification class.
  function(i) { # Use the extract() function to get the Landsat bands and the NDVI values for the selected locations.
    cbind(
      extract(landsat8, i),
      extract(NDVI, i)
    )
  }
)

# Now estimate the bands/NDV means
SpectProf.Means <- lapply(
  SpectProf, #  The list with the bands/NDVI values for the 100 points by classification class.
  function(i) { # Estimate the Mean accords the selected 100 points.
    apply(i, 2, mean)
  }
)

Your task:

Now that the Mean-band values are estimated for each land class, you can plot the spectral profiles. Do this either as a line plot or a bar plot.

# Line plot.
# You start by creating an empty plot.
plot.new() # Create / Start a New Plot Frame
plot.window(ylim = c(-0.2, 0.6), xlim = c(1, 7)) # Define the plot space extent.
axis(1, # Add the x-axis.
  at = 1:7, # Where each tick is placed.
  labels = c(paste0("Band", 1:6), "NDVI") # Name of the bands.
)
axis(2, # Add the y-axis.
  las = 2 # make axis text horizontal?
)

# Add the x-axis label.
mtext("Bands", # x-axis label.
  side = 1, # On which side of the plot? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Use outer margins?
  line = -2 # On which MARgin line add the text.
)

# Add the y-axis label.
mtext("Reflectance", # y-axis label.
  side = 2, # On which side of the plot? (1=bottom, 2=left, 3=top, 4=right).
  outer = T, # Use outer margins?
  line = -2 # On which MARgin line add the text.
)

# Now you add a line that show the mean values for a band across classes .
for (i in 1:5) {
  lines(SpectProf.Means[[i]], # The Mean value per band/NDVI.
    col = i, # Specify the colour to use.
    lwd = 2 # Specify line width.
  )
}
# A title to be neat
title(
  main = "Spectral Profile from Landsat",
  font.main = 2
)
# A Legend
legend("topleft",
  legend = paste("Class -", 1:5),
  cex = 0.8, col = 1:5,
  lty = 1, lwd = 3,
  bty = "n"
)

# as a Bar plot
# Turn the list into a table (Bands/NDVI are the variables)- use the do.call() function for this
SpectProf.Means2 <- do.call(
  "rbind", # what to do? (add as a string)
  SpectProf.Means # The List object on what to do the action
)

# Give the right names to the rows of the table
colnames(SpectProf.Means2) <- c(
  "Coast", "blue", "green", "red",
  "NIR", "SWIR1", "SWIR2", "NDVI"
)

# Use the barplot()
barplot(SpectProf.Means2, # What object to draw?
  beside = T, # How should the bas be plotted?
  col = rep(1:5, 7), # Define the colours.
  xlab = "Bands", # Define the x-axis legend.
  ylab = "Reflectance" # Define the y-axis legend.
)
# A title to be neat
title(
  main = "Spectral Profile from Landsat",
  font.main = 2
)
# Legend
legend("topleft", # Where to place the legend.
  legend = paste("Class -", 1:5), # The class names
  cex = 0.8, # Define the text size
  fill = 1:5, # Define the colours to fill
  bty = "n"
)

Final points

This has been a practical focus on working with satellite images as raster files and using classification techniques to group these into classes. There is much more that could be done here. For example, instead of defining the number of groups beforehand, you could assess a possible number of groups utilising a cluster analysis. I leave that the keen student.